Automatic Text Summarization Using Lexical Chains: Algorithms and Experiments
نویسندگان
چکیده
Summarization is a complex task that requires understanding of the document con tent to determine the importance of the text. Lexical cohesion is a method to identify connected portions of the text based on the relations between the words in the text. Lexical cohesive relations can be represented using lexical chains. Lexical chains are sequences of semantically related words spread over the entire text. Lexical chains are used in variety of Natural Language Processing (NLP) and Information Retrieval (IR) applications. In current thesis, we propose a lexical chaining method that includes the glossary relations in the chaining process. These relations enable us to identify topically related concepts, for instance dormitory and student, and thereby enhances the identification of cohesive ties in the text. We then present methods that use the lexical chains to generate summaries by extracting sentences from the document(s). Headlines are generated by filtering the portions of the sentences extracted, which do not contribute towards the meaning of the sentence. Headlines generated can be used in real world application to skim through the document collections in a digital library. Multi-document summarization is gaining demand with the explosive growth of online news sources. It requires identification of the several themes present in the collection to attain good compression and avoid redundancy. In this thesis, we propose methods to group the portions of the texts of a document collection into meaningful clusters. Clustering enable us to extract the various themes of the document collection. Sentences from clusters can then be extracted to generate a summary for the multi-document collection. Clusters can also be used to generate i summaries with respect to a given query. We designed a system to compute lexical chains for the given text and use them to extract the salient portions of the document. Some specific tasks considered are: headline generation, multi-document summarization, and query-based summariza tion. Our experimental evaluation shows that efficient summaries can be extracted for the above tasks.
منابع مشابه
Using Genetic Algorithms with Lexical Chains for Automatic Text Summarization
Automatic text summarization takes an input text and extracts the most important content in the text. Determining the importance of information depends on several factors. In this paper, we combine two different approaches that have been used in the text summarization domain. The first one is using genetic algorithms to learn the patterns in the documents that lead to the summaries. The other o...
متن کاملAn Automatic Text Summarization Using Lexical Cohesion and Correlation of Sentences
Due to substantial increase in the amount of information on the Internet, it has become extremely difficult to search for relevant documents needed by the users. To solve this problem, Text summarization is used which produces the summary of documents such that the summary contains important content of the document. This paper proposes a better approach for text summarization using lexical chai...
متن کاملComputing Lexical Chains for Automatic Arabic Text Summarization
Automatic Text Summarization has received a great deal of attention in the past couple of decades. It has gained a lot of interest especially with the proliferation of the Internet and the new technologies. Arabic as a language still lacks research in the field of Information Retrieval. In this paper, we explore lexical cohesion using lexical chains for an extractive summarization system for Ar...
متن کاملLexical Cohesion Based Topic Modeling for Summarization
In this paper, we attack the problem of forming extracts for text summarization. Forming extracts involves selecting the most representative and significant sentences from the text. Our method takes advantage of the lexical cohesion structure in the text in order to evaluate significance of sentences. Lexical chains have been used in summarization research to analyze the lexical cohesion struct...
متن کاملEfficiently Computed Lexical Chains as an Intermediate Representation for Automatic Text Summarization
While automatic text summarization is an area that has received a great deal of attention in recent research, the problem of efficiency in this task has not been frequently addressed. When considering the size and quantity of documents available on the Internet and from other sources, the need for a highly efficient tool that produces usable summaries is clear. We present a linear time algorith...
متن کامل